This is a large-scale speaker recognition dataset collected 'in the wild'.
The dataset contains more than 130,000 utterances from 1,000 Chinese
celebrities and covers 11 different real-world genres.
All audio files are encoded as single-channel and sampled at 16 kHz with 16-bit precision.

<p>
The data collection was organized by the Center for Speech and Language Technologies (CSLT), Tsinghua University.
It was funded by the National Natural Science Foundation of China under grant No. 61633013
and the Postdoctoral Science Foundation of China under grant No. 2018M640133.
<p>

You can cite the data using the following BibTeX entry:
<pre>
@misc{fan2019cnceleb,
  title={CN-CELEB: a challenging Chinese speaker recognition dataset},
  author={Yue Fan and Jiawen Kang and Lantian Li and Kaicheng Li and Haolin Chen and Sitong Cheng and Pengyuan Zhang and Ziya Zhou and Yunqi Cai and Dong Wang},
  year={2019},
  eprint={1911.01799},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
</pre>

<h3>PEOPLE</h3>
<p>Dong Wang, Yue Fan, Jiawen Kang, Lantian Li, Kaicheng Li, Haolin Chen, Sitong Cheng, 
Pengyuan Zhang, Ziya Zhou, Yunqi Cai</p>

<h3>CONTACT</h3>
<ul>
<li>Dong Wang:
<a href="mailto:wangdong99@mails.tsinghua.edu.cn">wangdong99@mails.tsinghua.edu.cn</a></li>

<li>Lantian Li:
<a href="mailto:lilt@cslt.org">lilt@cslt.org</a></li>

<li>Yue Fan:
<a href="mailto:fanyue@cslt.org">fanyue@cslt.org</a></li>

<li>Jiawen Kang:
<a href="mailto:kangjw@cslt.org">kangjw@cslt.org</a></li>

<li>Zhiyuan Tang:
<a href="mailto:tangzy@cslt.org">tangzy@cslt.org</a></li>
</ul>

<p>Address: ROOM 1-303, BLDG FIT, CSLT, Tsinghua University</p>
<p>Homepage: <a href="http://cslt.org">http://cslt.org</a> or <a href="http://cslt.riit.tsinghua.edu.cn">http://cslt.riit.tsinghua.edu.<wbr>cn</a></p>
